Background / Context: 

We report on a rigorous, multidisciplinary investigation of the heavily utilized Physics 
Diagnoser formative assessment system. Our focus is on multiple aspects of diagnostic, 
instructional and content validity of the system and assessments relative to their use in science 
classrooms. 

Facet-based assessments are one innovative approach to helping teachers diagnose students’ 
science understanding (Minstrell, 2001; Minstrell, Anderson, Kraus, & Minstrell, 2008). The 
facets perspective assumes that students’ understandings possess some strengths to build on, 
possibly in addition to problematic thinking that can be revised through additional learning 
opportunities. The term “facets” acknowledges that not all students’ thinking can be considered 
“misconceptions” or errors. Facet clusters serve as the interpretive framework for analyzing 
student responses to questions and for designing instructional activities to promote learning. 

Despite the designed-in multidimensionality of facet-based assessments, the preponderance 
of psychometric analyses performed so far have failed to capture the richness of the evidence 
about what students know and how they know it. Standard classical test theory or unidimensional 
approaches can be useful for capturing some critical measurement properties of items and 
instruments [such as gross item indices of “difficulty,” biserial correlations with total score, and 
unidimensional indices of total score reliability including KR20 and Cronbach’s alpha 
(Cronbach, 1951; Lord & Novick, 1968; Allen & Yen, 2002; van der Linden & Hambleton, 
1997)], but are not designed to reflect the facet-based multidimensional richness of the data. 
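
For reference, the two reliability indices named above have the following standard textbook forms. The notation is an assumption made here for illustration and does not come from the Diagnoser analyses: $k$ is the number of items, $\sigma_i^2$ the variance of item $i$, $\sigma_X^2$ the variance of the total score, and $p_i$ the proportion of students answering dichotomous item $i$ correctly.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\qquad
\mathrm{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i(1-p_i)}{\sigma_X^{2}}\right)
```

Both indices summarize the response data through item and total-score variances only, so neither can represent which facet a given response reflects.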

In light of the rich conceptual and cognitive model guiding item development and data 
collection, the failure to use more powerful measurement models means that the linkage from 
observation to interpretation and from interpretation back to cognition is only of the most 
rudimentary form. The work reported here provides a strong interpretive framework supported 
by sophisticated psychometric techniques as a way of capturing the diagnostic power of the 
instrument, and enhancing its usefulness as a formative assessment tool. Our approach provides 
insight about the potential of the facet-based approach to offer a clear and transparent articulation 
of the linkage between the assumptions about cognition and observed student performance. 

Purpose / Objective / Research Question / Focus of Study: 

The research design and team constitute a multidisciplinary attack on problems of 
educational and assessment design in physics instruction. Components of the research include: 
(a) an Evidence-Centered Design analysis of Diagnoser instructional materials and assessments 
that provides a view of the evidentiary coherence of the existing system; (b) an alignment study 
of the Diagnoser system with multiple standards frameworks that describes deep connections 
among existing standards frameworks and the Diagnoser system and illuminates how alignment 
approaches can simultaneously inform all aspects of a formative assessment system; (c) the 
application of sophisticated psychometric models to the existing data that provides statistical 
evidence for inferential claims that support classroom use of the Diagnoser system; and (d) the 
identification of cases in which new or improved Diagnoser question sets can be developed and 
tested with students. This paper focuses on methodologies associated with the psychometric 
analyses of the Diagnoser question sets and the alignment study. 

Force and Motion content was targeted in this project for two reasons. First, extensive 
research has been conducted on misconceptions in force and motion, both in the published 
literature (e.g., Driver et al., 1994; Hashweh, 1988; Lythcott, 1985; Wandersee et al., 1994) 
and by staff of several related Diagnoser projects, providing a substantial research basis for 
the design of the instructional materials and assessments. Second, Force and Motion clusters 
are among the earliest developed as part of the online Physics Diagnoser and have been used 
continually since 2004, during which time the system has been available via the internet, 
providing a large database of student performances for analysis. 

Setting: 

The study focuses on secondary school level physical science and physics instruction and 
assessment in Forces and Motion. The content is organized into three key concept strands 
(Description of Motion, Nature of Forces, and Forces to Explain Motion) and seventeen 
conceptual facet clusters, and includes more than thirty facet-based question sets. The list of 
facet clusters associated with each strand in Table 1 specifies the scope of the content. All 
instructional and assessment materials are available on the Diagnoser formative assessment 
system (www.Diagnoser.com). 

Population / Participants / Subjects: 

Since the launch of the present version of Diagnoser in September 2004, about 4000 teachers 
have registered to use the system. Nearly 300,000 question sets (3 million items) have been 
completed by typical middle school and high school physical science and physics students during 
the period 2004-2009. Roughly half these data will be the focus of this study. 

Intervention / Program / Practice: 

Facet clusters are presented online to students and teachers via www.Diagnoser.com. Figure 
1 shows the Forces as Interactions cluster as an example. Diagnoser includes questions to elicit 
student preconceptions (elicitation questions), lessons to allow students to test ideas and 
understandings (developmental lessons), sets of facet-based assessment items (question sets), and 
activities to address persistent problematic understandings (prescriptive activities). Each question 
set is composed of 6 to 12 items, and includes multiple-choice, numerical response, and 
open-ended text formats. Figure 2 presents one question set associated with the Forces as Interactions 
cluster. Teachers receive real-time results from question sets in reports describing the facets 
inferred for each student. 

Research Design: 

Psychometric analyses will include baseline unidimensional analyses, diagnostic modeling 
that includes new approaches for taking into account multiple choice distractors linked to 
specific facets, and direct statistical analyses of a student’s patterns of inferred facets within a 
question set. We are computing a facet prevalence value for each student, each question set and 
each facet based on how many times that facet response was inferred for that student compared 
to the set of all inferred facets. We will compare students’ facet prevalence values and their 
patterns to students’ diagnostic model-computed facet profiles. We report on a comparison of 
students’ performances at middle school and high school levels within and between clusters. To 
ensure that estimates of standard error, effect sizes, and statistical significance are not artificially 
inflated or deflated, we are accounting for clustering of students within classes by performing 
Hierarchical Linear Modeling (Raudenbush & Bryk, 2002) analyses. 
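
The facet prevalence value described above can be sketched as a simple proportion. The function name, the sample data, and the choice to drop responses with no inferred facet are assumptions made for illustration; they are not taken from the Diagnoser code base.

```python
from collections import Counter


def facet_prevalence(inferred_facets):
    """Return each facet's share of all facets inferred in one question set."""
    # Assumption: responses with no inferred facet (None) are excluded.
    known = [f for f in inferred_facets if f is not None]
    if not known:
        return {}
    counts = Counter(known)
    total = len(known)
    return {facet: n / total for facet, n in counts.items()}


# Hypothetical sequence of facets inferred for one student on one question set.
student_facets = ["02", "64", "64", "00", "64", None]
prevalence = facet_prevalence(student_facets)
print(prevalence)
```

Applying the same function to every student and question set would give the per-student, per-set, per-facet values that the comparison with model-computed facet profiles calls for.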

The Alignment Study will evaluate the content and diagnostic validity of the Diagnoser 
system, with an emphasis on an analysis of facet clusters and question sets, as shown in Table 2. 
Three panelists with expertise in physics content and science assessment will rate the alignment 
of facet clusters to standards frameworks and also rate the diagnostic capabilities of the question 
sets. All ratings will be completed independently. 

Data Collection and Analysis: 

Diagnostic Question Set Data. Existing Physics Diagnoser data available for analysis have 
been collected through the online website http://www.Diagnoser.com/. The primary student 
outcomes are the “facet scores” from a student’s question set responses, which are a sequence of 
inferred facets derived from a student’s multiple-choice responses. We summarize and represent 
students’ facet scores for a given question set in three ways: (1) prevalence of inferred facets 
across the question set, that is, which facets occurred and how often; (2) patterns of inferred 
facets; and (3) degree of consistency within a given student’s facet pattern. 
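
The second and third summaries can be illustrated with a short sketch. The text does not define the consistency measure, so the index below (the share of responses falling on a student's most frequent facet) is purely an assumption, as are the function names and the sample data.

```python
from collections import Counter


def facet_pattern(inferred_facets):
    """Represent a student's facet pattern as an ordered tuple of facet codes."""
    return tuple(inferred_facets)


def consistency(inferred_facets):
    """Share of responses that fall on the student's most frequent facet.

    Illustrative assumption only; not the study's definition of consistency.
    """
    if not inferred_facets:
        return 0.0
    most_common_count = Counter(inferred_facets).most_common(1)[0][1]
    return most_common_count / len(inferred_facets)


# Hypothetical facet sequences for three students on the same question set.
responses = {
    "student_a": ["02", "02", "00", "02"],
    "student_b": ["64", "53", "02", "90"],
    "student_c": ["02", "02", "00", "02"],
}

# How often each distinct pattern occurs across students.
pattern_counts = Counter(facet_pattern(seq) for seq in responses.values())
consistency_by_student = {s: consistency(seq) for s, seq in responses.items()}
```

A student who always lands on the same facet scores 1.0 on this index, while a student whose responses scatter across four facets in four items scores 0.25.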

Advanced psychometric analyses focus on two main objectives: (1) to summarize the 
observed ranges and patterns of performance of a variety of groups of students in the question 
sets associated with each cluster; and (2) to provide a definitive evaluation of the diagnostic 
capacity of the question sets with respect to inferring students’ facet profiles. Previous 
model-based psychometric studies of facet-based assessments have had limited success in capturing 
students’ observed performance on items in ways that were aligned with the cluster and facets 
structure (Wilson, 1992, 2008; Steedle & Shavelson, 2009; Steedle, 2008; Scalise, Madhyastha, 
Minstrell, & Wilson, 2010). We will report on a fresh set of approaches that take into account 
structural and design characteristics of the facets assessments using multidimensional diagnostic 
models. Highly promising diagnostic model-based analyses have been completed for another 
type of misconception-based assessment, the concept inventory (Santiago-Roman, 2009; 
Santiago-Roman et al., in preparation). In that work a set of “skills” for diagnostic measurement 
and reporting was shown to be highly predictive of student performance. We are carrying out 
similar diagnostic analyses for facet-based assessments, guided by findings from the alignment 
analyses and ECD analyses. We are using a constrained latent class model for cognitive 
diagnosis called the Fusion Model, calibrated with Markov Chain Monte Carlo methods 
(DiBello, Roussos & Stout, 2007; DiBello & Stout, 2003; DiBello & Stout, 2007), as well as a 
partial credit model that allows two possible responses to share the same ordered score level 
when scaling the response data (Wilson, 1992, 2008). New diagnostic models are being applied that 
take account of facet information linked to multiple-choice options (DiBello, Stout & Henson, in 
press; also see de la Torre, 2009). 

Alignment Study. In contrast to typical alignment studies, this study will provide more 
substantial evidence about the diagnostic capabilities of the Diagnoser formative assessment 
system. Panelists will judge each facet cluster’s alignment to three frameworks (AAAS 
Benchmarks for Scientific Literacy, College Board Science Standards, and the core and 
component ideas in the NRC’s New Framework for Science Education Standards). Each panelist 
first will make a judgment about the degree to which the goal facets in each cluster address the 
content in the standards using a 4-point Likert scale (1 = Fully aligned to 4 = Not aligned). 
Particularly relevant to evaluating the diagnostic capabilities of the system, panelists also will: 
(a) judge the extent to which the problematic facets in each cluster reflect common 
misconceptions supported in the literature using a 5-point scale (1 = Very good coverage to 5 = 
Very poor coverage); and (b) confirm that problematic facets in each cluster are appropriately 
ranked from less to more problematic. 

For each item in Diagnoser question sets, panelists will evaluate the extent to which response 
options exemplify the facets as they are coded. Panelists also will indicate which scientific 
practices (e.g., developing evidence-based models and explanations, reasoning about the 
relationships between variables) are elicited by associating questions with performance 
expectations in the College Board Standards. In addition, panelists will judge the degree to which 
the assessment tasks cover a range of representations and contexts needed to assess the content in 
the facet clusters. Because the data from question sets are critical for helping teachers make 
instructional decisions, panelists will indicate how effectively the reports: (a) communicate what 
students know and can do and (b) inform next instructional steps. Panelists will respond using 
4-point rating scales (Not at all effective/useful to Very effective/useful) and open-ended text responses. 

Finally, for each question set panelists will review three common pathways through the 
question sets (based on actual student data), as not all students complete the same questions or 
questions in the same sequence. For each pathway panelists will evaluate whether questions are 
presented in a logical, conceptually appropriate sequence for use by teachers to diagnose and 
remediate the misconceptions the students hold. Panelists will respond using a 4-point scale ranging from 
strongly agree to strongly disagree. 

For each question in the alignment protocol, agreement among panelists’ judgments will be 
calculated. Where possible, discrepant ratings will be resolved by consensus. 
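
The text does not name the agreement statistic to be used. As one simple possibility, the sketch below computes mean pairwise percent agreement among three panelists and lists the items that would need consensus discussion. The function names and the ratings are hypothetical, and a chance-corrected statistic could be substituted.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """Mean proportion of items on which each pair of panelists agrees.

    ratings maps a panelist label to a list of ratings, one per item.
    """
    proportions = []
    for a, b in combinations(ratings, 2):
        matches = sum(x == y for x, y in zip(ratings[a], ratings[b]))
        proportions.append(matches / len(ratings[a]))
    return sum(proportions) / len(proportions)


def discrepant_items(ratings):
    """Indices of items on which the panelists did not all give the same rating."""
    per_item = zip(*ratings.values())
    return [i for i, item in enumerate(per_item) if len(set(item)) > 1]


# Hypothetical 4-point alignment ratings (1 = Fully aligned to 4 = Not aligned).
ratings = {
    "panelist_1": [1, 2, 1, 4, 3],
    "panelist_2": [1, 2, 2, 4, 3],
    "panelist_3": [1, 3, 1, 4, 3],
}
agreement = pairwise_agreement(ratings)
to_resolve = discrepant_items(ratings)
```

In this made-up example the panelists disagree on the second and third items, so only those two would go forward to consensus resolution.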

Findings / Results: 

Preliminary findings from analyses of diagnostic data and the alignment study are expected 
by August. Findings from psychometric analyses are expected to provide statistical evidence for 
degree of consistency between students’ answer choices and inferred facet understandings. The 
analyses will include a comprehensive statistical description of facet patterns as they vary across 
groups of students and across clusters. A set of diagnostic inferential characteristics will be 
computed as measures of the diagnostic quality of the existing facets assessments. This first 
round of results from both studies will provide a basis for redesign or improvement of facet 
questions to be tested with new groups of students during the second half of the project. 

Conclusions: 

A key contribution of this project is its application of rigorous methods of psychometric and 
evidentiary analyses to the existing Physics Diagnoser system and its extensive existing database 
of student performance. These analyses will logically and statistically test the degree of 
concordance between the existing system and the substantial developmental foundations and 
design of the facets approach, which were based on pragmatic subject-area instructional and 
pedagogical expertise, on early literature on physics teaching and learning, and on misconception research. The 
Alignment Study implements a new approach for a comprehensive analysis of a formative 
assessment system. The psychometric analyses apply new diagnostic and statistical approaches 
to a very large database of existing data. In concert these approaches are assembling multiple 
strands of evidence to build a powerful, theoretically and empirically grounded validity argument 
that directly impacts the pragmatic successes of the Physics Diagnoser system, and that is 
expected to lead to methods for improving the questions to be tested with students in the second 
half of the project. In sum, this project applies a rigorous comprehensive approach to 
understanding the cognitive, instructional and inferential underpinnings of the Physics Diagnoser 
system in light of the Diagnoser’s pragmatic classroom applications. 

Appendix B. Tables and Figures 

Table 1

Strands and Clusters in Diagnoser

| Strands | Clusters |
|---|---|
| Description of Motion | 1. Position and Distance |
| | 2. Change in Direction |
| | 3. Determining Speed |
| | 4. Average Speed |
| | 5. Change in Speed |
| | 6. Acceleration |
| Nature of Forces | 1. Identifying Forces |
| | 2. Forces Acting at a Distance |
| | 3. Forces as Interactions |
| | 4. Gravitational Forces |
| | 5. Magnetic Forces |
| | 6. Electric Forces |
| | 7. Electromagnetic |
| Forces to Explain Motion | 1. Effects of Pushes and Pulls |
| | 2. Explaining Constant Speed |
| | 3. Explaining Changes in 1D Motion |
| | 4. Explaining Changes in 2D Motion |

Source. www.Diagnoser.com 

Table 2

Research Questions and Approach for Alignment Study

| Research Questions | Alignment to Internal or External Criteria or Other | Type of Validity |
|---|---|---|
| RQ1: To what degree do standards and facet clusters address the same content categories? | External: Alignment of goal facets to 3 standards frameworks (Benchmarks for Scientific Literacy, College Board Standards objectives, and New Framework for Science Education core and component ideas) | Content |
| RQ2: Do problematic facets represent the range of frequent 'misconceptions' and problematic ways of thinking? | External: Alignment of problematic facets to misconceptions reflected in research | Diagnostic; Cognitive |
| RQ3: Does the ranking of problematic facets reflect an order of less to more problematic? | Other: Judgment of the degree to which the ranking of the problematic facets reflects an order of less to more problematic | Diagnostic; Cognitive |
| RQ4: To what degree do Diagnoser questions align to facet clusters? (To what degree does the correct answer align to the goal facet? To what degree do the incorrect answers align to the problematic facets?) | Internal: Alignment of each question to facet cluster | Diagnostic; Cognitive; Indirect content validity* |
| RQ5: How well do the Diagnoser questions align with the scientific practices identified in the College Board Standards? | External: Alignment of each question to performance expectations in the College Board Standards for physical science and physics. (Performance expectations reflect the integration of practice with content specified in the associated objective.) | Content |
| RQ6: What degree of depth or complexity of knowledge do Diagnoser questions address? | External: Alignment of each question to components reflecting depth of knowledge (declarative, procedural, schematic, or strategic) | Cognitive |
| RQ7: Do the questions for each facet cluster provide opportunities for students to demonstrate knowledge of the facets using a range of representations and different contexts? | Other: Degree to which the assessment tasks cover a range of representations and different contexts | Diagnostic |
| RQ8: How well do the reports communicate student performance in ways that teachers understand and inform next instructional steps? | Other: Judgment of the effectiveness of the Diagnoser question set reporting functionality for supporting formative use of the question sets | Diagnostic |
| RQ9: To what extent do the pathways through the questions in each Diagnoser Question Set represent a logical, conceptually appropriate sequence? | Other: Judgment of the degree to which questions are presented in a logical, conceptually appropriate sequence in order to diagnose and remediate the misconceptions the students hold. | Diagnostic |

*Indirect content validity signifies that if questions are aligned to facet clusters and facet clusters are aligned to 
content standards, we can infer that questions are aligned to content standards. 

Facet Cluster - Forces as Interactions 

Facets and facet clusters are a framework for organizing the research on student conceptions so that it is 
understandable to both discipline experts and teachers. Facet clusters include the explicit learning goals in 
addition to various sorts of reasoning, conceptual and procedural difficulties. Each cluster contains the intuitive 
ideas students have as they move toward scientifically accurate learning targets. 

Facets are arranged with the Goal Facets at the top of the page followed by the more problematic facets. Each 
facet has a two-digit number. The 0X and 1X facets are the learning targets. The facets that begin with the 
numbers 2X through 9X indicate ideas that have more problematic aspects. In general, the higher facet numbers 
(e.g., 9X, 8X, 7X) are the more problematic facets. The X0’s indicate more general statements of student ideas. 
Often these are followed by more specific examples, which are coded X1 through X9. 

Forces as Interactions Facet Cluster 

00 The student understands that all forces arise out of an interaction between two objects and that these 
forces are equal in magnitude and opposite in direction. 

01 All forces arise out of an interaction between two objects. 

02 The force pairs are equal in magnitude. 

03 The force pairs are opposite in direction. 



40 The student identifies equal force pairs, but indicates that both forces act on the same object. (For the 
example of a book at rest on a table, the gravitational force down on the book and the normal force up by 
the table on the book are identified as an action-reaction pair.) 

50 The student uses the effects of a force as an indication of the relative magnitudes of the forces in an 
interaction. 

51 More damage indicates one of the interacting objects exerted a larger force. 

52 If an object is at rest, the interaction forces must be balanced. 

53 If an object moves, the interaction forces must be unbalanced. 

54 If an object accelerates, the interaction forces must be unbalanced. 

60 The student indicates that the forces in a force pair do not have equal magnitude because the objects 
are dissimilar in some property (e.g., bigger, stronger, faster). 

61 The 'stronger' object exerts a greater force. 

62 The moving object or a faster moving object exerts a greater force. 

63 The more active or energetic object exerts more force. 

64 The bigger or heavier object exerts more force. 



90 The student believes that inanimate/passive objects cannot exert a force. 



Source. www.Diagnoser.com 

Figure 1. Forces as Interactions Cluster 
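
The numbering rules stated in Figure 1 can be expressed as a small classifier. This is a minimal sketch under the assumption that facet codes arrive as two-character digit strings such as "02" or "64"; the function name and the returned fields are hypothetical and are not part of the Diagnoser system.

```python
def describe_facet(code):
    """Classify a two-digit facet code using the numbering rules in Figure 1."""
    if not (len(code) == 2 and code.isascii() and code.isdigit()):
        raise ValueError(f"expected a two-digit facet code, got {code!r}")
    tens, ones = int(code[0]), int(code[1])
    return {
        "code": code,
        # 0X and 1X are learning targets; 2X through 9X are problematic.
        "kind": "goal" if tens <= 1 else "problematic",
        # X0 is a general statement; X1 through X9 are specific examples.
        "level": "general" if ones == 0 else "specific",
        # In general, a higher tens digit marks a more problematic facet.
        "rank": tens,
    }


for facet in ["02", "64", "90"]:
    print(describe_facet(facet))
```

Responses coded "Facet Unknown" in the question sets fall outside this scheme, so the sketch rejects them rather than guessing a category.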

Question 1. Jennifer and Katie stand and lean on each other. Jennifer weighs 150 pounds and Katie 
weighs 120 pounds. Which one pushes harder on the other? 

(a) Katie must push harder because she weighs less and has to compensate for having less weight. [Facet 63]* 

(b) Jennifer and Katie push on each other with the same size force because force pairs are always equal. [Facet 02] 

(c) Jennifer pushes harder because she weighs more. [Facet 64] 

(d) It depends on whether Jennifer or Katie moves. [Facet 50] 

Question 2. In the text box below, describe who exerts the greater force in each of the following conditions AND 
why. 

1. If Katie moves 

2. If Jennifer moves 

3. If neither moves 

Question 3. In the picture, the book is at rest on the table. Which statement best describes the forces resulting from 
an interaction between objects? 




(a) The book pushes down on the table with the same force that the table pushes up on the book. [Facet 01] 

(b) The force by the table is equal and opposite to the force of gravity. [Facet 40] 

(c) The gravitational force is equal to the weight of the book. [Facet Unknown] 

(d) There are no forces due to an interaction in this situation since gravity is the only force acting on the book. [Facet 90] 

Question 4. Jared is trying to decide if he will be able to push his car home after it runs out of gas. Which of the 
following conclusions is most likely to be true? 

(a) If the car does move it is because Jared pushed harder on the car than the car pushed on him. [Facet 53] 

(b) The car will NOT move because it is heavier than Jared so the car pushes back on him harder than he can push on the car. [Facet 64] 

(c) The car will NOT move because the forces on the car are equal and opposite, so there is no net force. [Facet 40] 

(d) The car may move. The motion of the car depends only on the forces acting on the car, not the force of the car pushing back on Jared. [Facet 00] 

(e) Only if Jared is strong enough will he be able to push harder on the car than the car can push on him. [Facet 63] 

Question 5. Bob weighs 150 pounds (667 newtons). Which of the following statements best explains the 
gravitational force affecting Bob? 

(a) The Earth pulls on Bob with 150 pounds of gravitational force, but Bob pulls with less gravitational force on the 
earth since he has less mass. [Facet 64] 

(b) The Earth pulls on Bob with 150 pounds of gravitational force, and Bob pulls on the Earth with equal 
gravitational force since force pairs are equal. [Facet 02] 

(c) Since Bob will fall "toward the Earth" if he steps off a chair, the Earth's gravity must pull on him more than he 
pulls on the Earth. [Facet 53] 

(d) The gravitational force acting on Bob must be more than the gravitational force he exerts on the Earth since Bob 
is the one who accelerates as he falls. [Facet 54] 




Figure 2. Forces as Interactions Question Set 1 

Question 6. Sarah plays defensive back on her school's soccer team. At practice she kicks the ball that was rolling 
toward her to the other end of the field. Which statement describes the force by the ball acting on Sarah's foot during 
the kick? 

(a) The ball does not exert a force on Sarah's foot. [Paired] with question: 7 

(b) The force by the ball is less than the force of Sarah's kick. [Paired] with question: 7 

(c) The force by the ball is equal to the force of Sarah's kick. [Paired] with question: 7 

(d) The force by the ball is greater than the force of Sarah's kick. [Paired] with question: 7 

Question 7. Which reason best fits your answer to the previous question? 

(a) Sarah is stronger than the ball. 

(b) Sarah's kick made the ball move, but the ball did not move Sarah. 

(c) Only Sarah can exert a force; the ball is not alive. 

(d) All interacting objects exert equal forces on each other. 

Combination of responses to Questions 6 and 7 linked to facets as follows: 

(a:a) Sarah is stronger than the ball. [Facet 90][Facet 61] 

(a:b) Sarah's kick made the ball move, but the ball did not move Sarah. [Facet 90][Facet 53] 

(a:c) Only Sarah can exert a force; the ball is not alive. [Facet 90][Facet 90] 

(a:d) All interacting objects exert equal forces on each other. [Facet 90][Facet 01] 

(a:e) The ball hurt Sarah's foot more than she hurt the ball. [Facet 90][Facet 51] 

(b:a) Sarah is stronger than the ball. [Facet 61][Facet 61] 

(b:b) Sarah's kick made the ball move, but the ball did not move Sarah. [Facet 53][Facet 53] 

(b:c) Only Sarah can exert a force; the ball is not alive. [Facet 60][Facet 90] 

(b:d) All interacting objects exert equal forces on each other. [Facet 60][Facet 01] 

(b:e) The ball hurt Sarah's foot more than she hurt the ball. [Facet 60][Facet 51] 

(c:a) Sarah is stronger than the ball. [Facet Unknown][Facet 61] 

(c:b) Sarah's kick made the ball move, but the ball did not move Sarah. [Facet Unknown][Facet 53] 

(c:c) Only Sarah can exert a force; the ball is not alive. [Facet Unknown][Facet 90] 

(c:d) All interacting objects exert equal forces on each other. [Facet 02][Facet 01] 

(c:e) The ball hurt Sarah's foot more than she hurt the ball. [Facet Unknown][Facet 51] 

(d:a) Sarah is stronger than the ball. [Facet 50][Facet 61] 

(d:b) Sarah's kick made the ball move, but the ball did not move Sarah. [Facet 50][Facet 53] 

(d:c) Only Sarah can exert a force; the ball is not alive. [Facet 50][Facet 90] 

(d:d) All interacting objects exert equal forces on each other. [Facet 50][Facet 01] 

(d:e) The ball hurt Sarah's foot more than she hurt the ball. [Facet 50][Facet 51] 

(a:f) The ball was moving when Sarah kicked the ball. [Facet 90][Facet 62] 

(b:f) The ball was moving when Sarah kicked the ball. [Facet Unknown][Facet 62] 

(c:f) The ball was moving when Sarah kicked the ball. [Facet Unknown][Facet 62] 

(d:f) The ball was moving when Sarah kicked the ball. [Facet 62][Facet 62] 

Question 8. When two objects interact and exert forces on each other, how can you tell which object exerts more 
force? Explain your answer in the space below. 



Source. www.Diagnoser.com 

* The facet codes do not appear in the questions shown to students, but are the facets diagnosed for the teacher for 
each response. See Figure 1 for Forces as Interactions facet cluster. 

Figure 2. Forces as Interactions Question Set 1 (continued) 